Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching
نویسندگان
چکیده
منابع مشابه
Distribution-Matching Embedding for Visual Domain Adaptation
Domain-invariant representations are key to addressing the domain shift problem where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since ...
متن کاملMemeSequencer: Sparse Matching for Embedding Image Macros
The analysis of the creation, mutation, and propagation of social media content on the Internet is an essential problem in computational social science, affecting areas ranging from marketing to political mobilization. A first step towards understanding the evolution of images online is the analysis of rapidly modifying and propagating memetic imagery or ‘memes’. However, a pitfall in proceedin...
متن کاملConditional Image-Text Embedding Networks
This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...
متن کاملText Embedding with Advanced Recurrent Neural Model
Embedding method has become a popular way to handle unstructured data, such as word and text. Word embedding, providing computational-friendly representations for word similarity, is almost be one of the standard solutions for various text mining tasks. Lots of recent studies focusing on word embedding try to generate a more comprehensive representation for each word that incorporating task-spe...
متن کاملStacked Cross Attention for Image-Text Matching
In this paper, we study the problem of image-text matching. Inferring the latent semantic alignment between objects or other salient stuffs (e.g. snow, sky, lawn) and the corresponding words in sentences allows to capture fine-grained interplay between vision and language, and makes image-text matching more interpretable. Prior works either simply aggregate the similarity of all possible pairs ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Proceedings of the AAAI Conference on Artificial Intelligence
سال: 2020
ISSN: 2374-3468,2159-5399
DOI: 10.1609/aaai.v34i07.6631